@dgovil
Contributor

@dgovil dgovil commented Mar 26, 2025

Description of Proposal

This proposal updates the prior Language proposal and tries to incorporate much of the feedback received since the initial proposal.

It also tries to incorporate some of the use cases that have arisen since then, and to aggregate common patterns from other localization formats.

Link to Rendered Proposal

The primary change is that localizations have been moved from the prims themselves to a separate localization catalog.
This addresses a few of the concerns folks had, including:

  1. Storing a lot of potentially unused attributes on a prim, and having to parse all the prim identifiers to build the translation map. These are now stored external to the prim.
  2. Knowing which prims are translated. There is now a metadata flag for it.
  3. Translation of non-attribute types shown in UIs, like Variant Sets and Variants, as they can now also be translated with this configuration
  4. Doubling up of data if two attributes share the same string

I believe this is a more elegant system as a result of that feedback loop.

@spiffmon
Member

This new direction looks promising, @dgovil! We have several suggestions that we think simplify it further and foster more efficient evaluation, since we actually do want to keep a future OpenExec implementation in mind.

Firstly, just a couple of things to verify/clarify, which enable one of our suggestions but are of concern even in the current proposal:

  • Given a string-to-be-localized like "There's a snake in my boot", we do not believe the translation should be allowed to change over time, correct?
  • Can we clarify that this mechanism is not intended for use with long strings (like embedding movie dialog into scene description)? The comparison/hashing time for long strings seems like a source of strange and potentially frustrating behavior.

Catalog Encoding

The separation of behavior that does inherit down namespace and that which doesn't seems logical. In the interests of data locality (and survival under referencing), and efficient evaluation, we'd like to propose changing the proposed LocalizationCatalog/LocalizedString schemas into a single LocalizationCatalogAPI that simply provides a dictionary-valued extension metadatum localizationCatalog. (We say "provides" acknowledging that metadata can't be scoped to particular schemas in the data model itself, but stipulating that a client is not expected to look for the dictionary unless the API is applied on the prim).

localizationCatalog would be a two-level nested dictionary, where each primary entry key would be a language specifier, and the keys of each nested dictionary would be the "to be localized" texts, such as "There's a snake in my boot", with the value being the localized string for the language specifier. However, if it seems like it would be easier or more resonant for users to flip the nesting (which more closely matches the current proposal and, as I now see is mentioned, other standard encodings), that would work just as well. Either way, we benefit from the "sparse composability" of dictionaries in USD to allow stronger layers to add new entries or partial entries into existing catalogs. The greatest benefit here, though, is that once (simply) deserialized, this dictionary is already in exactly the form we need to do efficient lookups.
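
As a rough illustration of the language-first encoding and of sparse composition, here is a minimal sketch using plain Python dictionaries. It does not use the USD API; `merge_catalogs` and `localize` are hypothetical helper names, the translations are illustrative, and the per-key merge only approximates how USD composes dictionary-valued metadata across layers.

```python
# Assumed catalog shape: {language: {source_text: localized_text}}.
weaker_catalog = {
    "fr": {"There's a snake in my boot": "Il y a un serpent dans ma botte"},
}
# A stronger layer sparsely adds one entry to "fr" and a new language.
stronger_catalog = {
    "fr": {"Hello": "Bonjour"},
    "de": {"Hello": "Hallo"},
}


def merge_catalogs(stronger, weaker):
    """Merge two catalogs key by key, with the stronger opinion winning."""
    merged = {lang: dict(entries) for lang, entries in weaker.items()}
    for lang, entries in stronger.items():
        merged.setdefault(lang, {}).update(entries)
    return merged


def localize(catalog, language, text):
    """Return the localized text, or the source text if no entry exists."""
    return catalog.get(language, {}).get(text, text)


catalog = merge_catalogs(stronger_catalog, weaker_catalog)
print(localize(catalog, "fr", "Hello"))
print(localize(catalog, "fr", "There's a snake in my boot"))
print(localize(catalog, "ja", "Hello"))  # no "ja" entries, falls back to source
```

Falling back to the source string when no entry exists is an assumption made for this sketch, not something the proposal specifies.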

Inheritance Override/Merge of Catalogs?

The current proposal mentions namespace inheritance of catalog binding, such that closest-binding-to-prim wins. We can match that behavior by stipulating that the closest ancestor with LocalizationCatalogAPI applied wins and provides the complete catalog to be used by descendants, or we could say that all ancestors' catalogs merge, with closest ancestor's content strongest. That would be a new inheritance semantic (though similar to that of SemanticsLabelsAPI), so not to be adopted lightly since it does add more complexity, but worth considering if it would provide considerable value.
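
To make the two candidate semantics concrete, the following sketch compares them over a toy namespace. Prim paths are plain strings, `catalogs_by_path` is a hypothetical stand-in for "prims with the API applied", and treating the prim itself as the closest candidate is an assumption of this sketch.

```python
def ancestors_closest_first(prim_path):
    """Yield the prim's own path, then each ancestor path, closest first."""
    parts = prim_path.strip("/").split("/")
    for end in range(len(parts), 0, -1):
        yield "/" + "/".join(parts[:end])


def closest_catalog(catalogs_by_path, prim_path):
    """Option 1: the closest catalog wins and is used as the complete catalog."""
    for path in ancestors_closest_first(prim_path):
        if path in catalogs_by_path:
            return catalogs_by_path[path]
    return {}


def merged_catalog(catalogs_by_path, prim_path):
    """Option 2: all ancestor catalogs merge, with the closest one strongest."""
    merged = {}
    # Walk from the farthest ancestor inward so closer entries overwrite.
    for path in reversed(list(ancestors_closest_first(prim_path))):
        for lang, entries in catalogs_by_path.get(path, {}).items():
            merged.setdefault(lang, {}).update(entries)
    return merged


catalogs_by_path = {
    "/World": {"fr": {"Hello": "Bonjour", "Goodbye": "Au revoir"}},
    "/World/Kit": {"fr": {"Hello": "Salut"}},
}
print(closest_catalog(catalogs_by_path, "/World/Kit/Lamp"))
print(merged_catalog(catalogs_by_path, "/World/Kit/Lamp"))
```

Under the first option the partial catalog on `/World/Kit` hides the "Goodbye" entry from `/World`; under the second it is still found, at the cost of walking and merging every ancestor.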

localized Metadatum

Do we really need this? Can presence of LocalizationAPI on a prim not be a sufficient indicator that any uiHints:displayName, uiHints:displayGroup, variant/variantSet (though see below), or string-valued attribute's value should be localized? For one thing, it seems a bit counter-purpose to gather all the localizations centrally so that they can be shared by multiple prims/properties, but then still require you to annotate each and every consuming instance individually. This is just a question to see how strongly you feel this is beneficial.

Variants and VariantSets

We're a bit concerned about allowing localization directly on scene object identifiers like variant and variantSet names, given the potential for inadvertent confusion and mis-authoring. We would propose instead tying this aspiration to the displayName and variantDisplayNames additions proposed in the Variant Set Metadata proposal. And this aspiration would provide grist for implementing (after updating for uiHints) the proposal :-)

Thanks for pushing this forward; it seems like a valuable addition.

@dgovil
Contributor Author

dgovil commented Aug 20, 2025

Hey Spiff, thanks for the in-depth response.

Regarding the first couple questions:

  1. I think the localization itself would not be time varying BUT there is the chance that the source string to be localized might be time varying. I don't have a specific use for that myself though other than perhaps accessibility data. In that case the localization would be static to the input string, but would vary along with it in time? Again, just thinking hypothetically.
  2. I think long strings might come up, although they'd be rare. I'm thinking of the use of something like a New York Times supplemental piece or museum infographic that might have diegetic text in the future. I do think they'd be rare though...

Catalog Encoding
I think having a dictionary of {language: {source: translation}} would make sense, and as you mention, sparse composition would mean we'd keep the benefits of letting people do sparse additions. I always forget that dictionaries have that ability.

I don't have a strong preference for whether the language identifier is foremost or the string itself. I think there's probably a benefit of having the language first, so you can just keep the parts of the dictionary you need.
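
A small sketch of why language-first nesting makes it easy to keep only the parts you need. Both helpers are hypothetical and operate on plain Python dictionaries rather than USD metadata; the translations are illustrative.

```python
def keep_languages(catalog, wanted):
    """Language-first nesting: dropping a language removes one top-level key."""
    return {lang: entries for lang, entries in catalog.items() if lang in wanted}


def keep_languages_source_first(catalog, wanted):
    """Source-first nesting: every entry has to be visited and filtered."""
    trimmed = {}
    for source, translations in catalog.items():
        kept = {lang: text for lang, text in translations.items() if lang in wanted}
        if kept:
            trimmed[source] = kept
    return trimmed


language_first = {
    "fr": {"Hello": "Bonjour", "Goodbye": "Au revoir"},
    "de": {"Hello": "Hallo", "Goodbye": "Auf Wiedersehen"},
}
source_first = {
    "Hello": {"fr": "Bonjour", "de": "Hallo"},
    "Goodbye": {"fr": "Au revoir", "de": "Auf Wiedersehen"},
}

print(keep_languages(language_first, {"fr"}))
print(keep_languages_source_first(source_first, {"fr"}))
```

Both encodings carry the same information; the difference is only in how much of the dictionary has to be touched to discard unused languages.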

I do think people might want to (in the future) also apply localization to asset paths like audio and textures. I don't think we need to tackle that any time soon, but I think it's worth considering while making the catalog encoding. But I think a dictionary would scale fine there too.

Inheritance Override/Merge of Catalogs?
My one concern was if someone referenced in a prim with a partial catalog onto a prim halfway up the hierarchy, causing the translation to break, but I don't think that will be a common case.
I think taking your suggestion of closest ancestor with no merging makes sense for performance reasons, and someone could reference in the full catalog if they need it if they accidentally break things.

localized Metadatum
I don't think we need it. My thinking was that you could prevent unnecessary lookups, but I think maybe it's premature optimization on my part and we can skip it for now.

One thing that someone did mention is that HTML allows for the inverse <p translate="no">Something Important that we shouldn't trust a robot to auto-translate</p> (https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Global_attributes/translate) but I feel like that's also something we could add in the future and don't need to bake into the first version.

Variants and VariantSets
Oh perfect, yes, if we could get display names for variants, that is really the thing I want to translate. Our use case is that we use USDZ with variant sets on the root prim as a configurator and currently display the variant identifier with some string shenanigans. But if we could use a display name there it would let us do more interesting things and have fewer workarounds.

@dgovil
Contributor Author

dgovil commented Aug 20, 2025

Oh, but on the topic of long strings, it just occurred to me that only the lookup string length should matter, not the final translated string length.

In that case, yes, the lookups can be short. Often we'd just put in some kind of identifier like "LOC_Level1_Desc" and then have the actual long string be part of the language catalog.

I think as long as we provide guidance, short strings should be fine.
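
A minimal sketch of the identifier-as-key pattern, assuming the language-first dictionary shape discussed above. The catalog contents and the `localize` helper are illustrative only; note that with identifier keys even the source language needs a catalog entry.

```python
# Short identifier keys, long localized values (contents are made up).
catalog = {
    "en": {
        "LOC_Level1_Desc": (
            "Welcome to the first gallery. This panel describes the exhibit "
            "in several sentences that would be awkward to use as a lookup key."
        ),
    },
    "fr": {
        "LOC_Level1_Desc": (
            "Bienvenue dans la première galerie. Ce panneau décrit l'exposition "
            "en plusieurs phrases."
        ),
    },
}


def localize(catalog, language, key):
    """Look up a short key; the hashing cost depends on the key, not the value."""
    return catalog.get(language, {}).get(key, key)


print(localize(catalog, "fr", "LOC_Level1_Desc"))
print(localize(catalog, "ja", "LOC_Level1_Desc"))  # missing: shows the raw key
```

One trade-off this makes visible: a missing entry surfaces the raw identifier rather than readable text, so guidance would probably need to cover fallback behavior as well.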

@spiffmon
Member

That all sounds great, and that kind of planned/structured use of long strings is clever. Regarding time-varying, yes, that's exactly what I was thinking also: the translations are entirely value-based, and shouldn't care what timeCode the value was extracted from.
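
To sketch what "value-based" means for a time-varying source string: the source value is resolved at a time first, and only then looked up in a static catalog. The time samples are modeled as a plain dictionary with held interpolation; `value_at` and `localized_at` are hypothetical helpers, not USD API calls.

```python
import bisect


def value_at(time_samples, time_code):
    """Held interpolation: the sample at or before time_code (first sample if earlier)."""
    times = sorted(time_samples)
    index = bisect.bisect_right(times, time_code) - 1
    return time_samples[times[max(index, 0)]]


def localized_at(time_samples, time_code, catalog, language):
    """Resolve the source string at time_code, then do a time-independent lookup."""
    source = value_at(time_samples, time_code)
    return catalog.get(language, {}).get(source, source)


# The source string varies over time; the catalog itself has no notion of time.
time_samples = {1: "Open", 10: "Closed"}
catalog = {"es": {"Open": "Abierto", "Closed": "Cerrado"}}

print(localized_at(time_samples, 5, catalog, "es"))
print(localized_at(time_samples, 10, catalog, "es"))
```

The localized result changes over time only because the source value does; the same source string always maps to the same translation.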

@spiffmon spiffmon moved this from Todo to Draft in OpenUSD Proposals Status Sep 6, 2025